Overview

Dataset Statistics

Number of Variables 46
Number of Rows 3376
Missing Cells 19952
Missing Cells (%) 12.8%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 4.0 MB
Average Row Size in Memory 1.2 KB
Variable Types
  • Numerical: 27
  • Categorical: 19
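
The missing-cell percentage above can be cross-checked against the other figures in this block. The sketch below assumes the percentage is missing cells divided by rows times variables; the report does not state the formula, and the variable names are illustrative only.

```python
# Figures copied from the Dataset Statistics block.
n_variables = 46
n_rows = 3376
missing_cells = 19952
n_numerical, n_categorical = 27, 19

# Assumption: the share is taken over every cell (rows x variables).
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

print(n_numerical + n_categorical == n_variables)  # True
print(total_cells)                                 # 155296
print(f"{missing_pct:.1f}%")                       # 12.8%
```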

Dataset Insights

PropertyGFATotal and PropertyGFABuilding(s) have similar distributions Similar Distribution
SiteEUI(kBtu/sf) and SiteEUIWN(kBtu/sf) have similar distributions Similar Distribution
SourceEUI(kBtu/sf) and SourceEUIWN(kBtu/sf) have similar distributions Similar Distribution
SiteEnergyUse(kBtu) and SiteEnergyUseWN(kBtu) have similar distributions Similar Distribution
SecondLargestPropertyUseType has 1697 (50.27%) missing values Missing
SecondLargestPropertyUseTypeGFA has 1697 (50.27%) missing values Missing
ThirdLargestPropertyUseType has 2780 (82.35%) missing values Missing
ThirdLargestPropertyUseTypeGFA has 2780 (82.35%) missing values Missing
YearsENERGYSTARCertified has 3257 (96.48%) missing values Missing
ENERGYSTARScore has 843 (24.97%) missing values Missing
Comments has 3376 (100.0%) missing values Missing
Outlier has 3344 (99.05%) missing values Missing
OSEBuildingID is skewed Skewed
ZipCode is skewed Skewed
Latitude is skewed Skewed
NumberofBuildings is skewed Skewed
NumberofFloors is skewed Skewed
PropertyGFATotal is skewed Skewed
PropertyGFAParking is skewed Skewed
PropertyGFABuilding(s) is skewed Skewed
LargestPropertyUseTypeGFA is skewed Skewed
SecondLargestPropertyUseTypeGFA is skewed Skewed
ThirdLargestPropertyUseTypeGFA is skewed Skewed
SiteEUI(kBtu/sf) is skewed Skewed
SiteEUIWN(kBtu/sf) is skewed Skewed
SourceEUI(kBtu/sf) is skewed Skewed
SourceEUIWN(kBtu/sf) is skewed Skewed
SiteEnergyUse(kBtu) is skewed Skewed
SiteEnergyUseWN(kBtu) is skewed Skewed
SteamUse(kBtu) is skewed Skewed
Electricity(kWh) is skewed Skewed
Electricity(kBtu) is skewed Skewed
NaturalGas(therms) is skewed Skewed
NaturalGas(kBtu) is skewed Skewed
TotalGHGEmissions is skewed Skewed
GHGEmissionsIntensity is skewed Skewed
PropertyName has a high cardinality: 3362 distinct values High Cardinality
Address has a high cardinality: 3354 distinct values High Cardinality
TaxParcelIdentificationNumber has a high cardinality: 3268 distinct values High Cardinality
ListOfAllPropertyUseTypes has a high cardinality: 466 distinct values High Cardinality
LargestPropertyUseType has a high cardinality: 56 distinct values High Cardinality
YearsENERGYSTARCertified has a high cardinality: 65 distinct values High Cardinality
DataYear has constant value "2016" Constant
City has constant value "Seattle" Constant
State has constant value "WA" Constant
DataYear has constant length 4 Constant Length
City has constant length 7 Constant Length
State has constant length 2 Constant Length
CouncilDistrictCode has constant length 1 Constant Length
Comments has all distinct values (vacuously: all 3376 values are missing) Unique
Longitude has 3376 (100.0%) negatives Negatives
PropertyGFAParking has 2872 (85.07%) zeros Zeros
SteamUse(kBtu) has 3237 (95.88%) zeros Zeros
NaturalGas(therms) has 1258 (37.26%) zeros Zeros
NaturalGas(kBtu) has 1258 (37.26%) zeros Zeros
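
The missing-value percentages in the insights above follow directly from the row count. The sketch below recomputes them and flags columns above a cutoff; the 80% threshold and the names `DROP_THRESHOLD_PCT` and `drop_candidates` are hypothetical choices for illustration, not something the report prescribes.

```python
n_rows = 3376

# Missing counts copied from the insights list.
missing_counts = {
    "SecondLargestPropertyUseType": 1697,
    "SecondLargestPropertyUseTypeGFA": 1697,
    "ThirdLargestPropertyUseType": 2780,
    "ThirdLargestPropertyUseTypeGFA": 2780,
    "YearsENERGYSTARCertified": 3257,
    "ENERGYSTARScore": 843,
    "Comments": 3376,
    "Outlier": 3344,
}

# Hypothetical cutoff, not taken from the report.
DROP_THRESHOLD_PCT = 80.0

# Percentage of rows missing per column, rounded as in the report.
missing_pct = {col: round(100 * n / n_rows, 2) for col, n in missing_counts.items()}

# Columns whose missing share exceeds the assumed cutoff.
drop_candidates = [col for col, pct in missing_pct.items() if pct > DROP_THRESHOLD_PCT]

for col, pct in missing_pct.items():
    print(f"{col}: {pct}%")
print("Above threshold:", drop_candidates)
```

Whether a mostly empty column should actually be dropped depends on the analysis; Outlier, for example, is sparse by design.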

Variables


OSEBuildingID

numerical

Approximate Distinct Count 3376
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 21208.9911
Minimum 1
Maximum 50226
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • OSEBuildingID is skewed left (γ1 = -0.0083)

Quantile Statistics

Minimum 1
5-th Percentile 275.5
Q1 19990.75
Median 23112
Q3 25994.25
95-th Percentile 49784.25
Maximum 50226
Range 50225
IQR 6003.5

Descriptive Statistics

Mean 21208.9911
Standard Deviation 12223.757
Variance 1.4942e+08
Sum 7.1602e+07
Skewness -0.008275
Kurtosis 0.6481
Coefficient of Variation 0.5763
  • OSEBuildingID is not normally distributed (p-value 1.2082653104760572e-08)
  • OSEBuildingID has 898 outliers
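
The derived figures in this block (IQR, range, variance, coefficient of variation) can be reproduced from the reported quantiles, mean and standard deviation. This is a minimal sketch assuming the usual definitions (IQR = Q3 - Q1, CV = standard deviation / mean); the report does not spell them out.

```python
# Figures copied from the OSEBuildingID block.
q1, q3 = 19990.75, 25994.25
minimum, maximum = 1, 50226
mean, std = 21208.9911, 12223.757

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
variance = std ** 2
cv = std / mean                # coefficient of variation

print(iqr)                # 6003.5
print(value_range)        # 50225
print(f"{variance:.4e}")  # 1.4942e+08
print(round(cv, 4))       # 0.5763
```

The same relations apply to every numerical variable below.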

DataYear

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 232944

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row 2016
2nd row 2016
3rd row 2016
4th row 2016
5th row 2016

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 13504
  • DataYear has words of constant length

BuildingType

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 277397

Length

Mean 17.1674
Standard Deviation 3.0505
Median 20
Minimum 6
Maximum 20

Sample

1st row NonResidential
2nd row NonResidential
3rd row NonResidential
4th row NonResidential
5th row NonResidential

Letter

Count 45425
Lowercase Letter 36524
Space Separator 3600
Uppercase Letter 8901
Dash Punctuation 1794
Decimal Number 3612
  • The top 2 categories (NonResidential, Multifamily LR (1-4)) take over 50.0%

PrimaryPropertyType

categorical

Approximate Distinct Count 24
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 277471
  • The largest value (Low-Rise Multifamily) is over 1.75 times larger than the second largest value (Mid-Rise Multifamily)

Length

Mean 17.1893
Standard Deviation 6.1164
Median 20
Minimum 5
Maximum 27

Sample

1st row Hotel
2nd row Hotel
3rd row Hotel
4th row Hotel
5th row Hotel

Letter

Count 51664
Lowercase Letter 43004
Space Separator 3640
Uppercase Letter 8660
Dash Punctuation 2409
Decimal Number 278
  • The largest value (multifamily) is over 1.68 times larger than the second largest value (lowrise)

PropertyName

categorical

Approximate Distinct Count 3362
Approximate Unique (%) 99.6%
Missing 0
Missing (%) 0.0%
Memory Size 284935

Length

Mean 19.4002
Standard Deviation 8.2587
Median 18
Minimum 2
Maximum 72

Sample

1st row Mayflower park hot...
2nd row Paramount Hotel
3rd row 5673-The Westin Se...
4th row HOTEL MAX
5th row WARWICK SEATTLE HO...

Letter

Count 55182
Lowercase Letter 42135
Space Separator 6431
Uppercase Letter 13047
Dash Punctuation 300
Decimal Number 2807
  • PropertyName contains many words: 3141 words

Address

categorical

Approximate Distinct Count 3354
Approximate Unique (%) 99.4%
Missing 0
Missing (%) 0.0%
Memory Size 277642

Length

Mean 17.2399
Standard Deviation 3.7735
Median 17
Minimum 8
Maximum 41

Sample

1st row 405 Olive way
2nd row 724 Pine street
3rd row 1900 5th Avenue
4th row 620 STEWART ST
5th row 401 LENORA ST

Letter

Count 32736
Lowercase Letter 21859
Space Separator 9521
Uppercase Letter 10877
Dash Punctuation 49
Decimal Number 15091
  • Address contains many words: 2296 words
  • The largest value (ave) is over 2.73 times larger than the second largest value (st)

City

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 243072

Length

Mean 7
Standard Deviation 0
Median 7
Minimum 7
Maximum 7

Sample

1st row Seattle
2nd row Seattle
3rd row Seattle
4th row Seattle
5th row Seattle

Letter

Count 23632
Lowercase Letter 20256
Space Separator 0
Uppercase Letter 3376
Dash Punctuation 0
Decimal Number 0
  • City has words of constant length

State

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 226192

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row WA
2nd row WA
3rd row WA
4th row WA
5th row WA

Letter

Count 6752
Lowercase Letter 0
Space Separator 0
Uppercase Letter 6752
Dash Punctuation 0
Decimal Number 0
  • State has words of constant length

ZipCode

numerical

Approximate Distinct Count 55
Approximate Unique (%) 1.6%
Missing 16
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 53760
Mean 98116.9491
Minimum 98006
Maximum 98272
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ZipCode is skewed right (γ1 = 1.9988)

Quantile Statistics

Minimum 98006
5-th Percentile 98101
Q1 98104
Median 98115
Q3 98122
95-th Percentile 98144
Maximum 98272
Range 266
IQR 18

Descriptive Statistics

Mean 98116.9491
Standard Deviation 18.6152
Variance 346.5258
Sum 3.2967e+08
Skewness 1.9988
Kurtosis 10.4756
Coefficient of Variation 0.00018972
  • ZipCode is not normally distributed (p-value 4.481324314774726e-14)
  • ZipCode has 114 outliers

TaxParcelIdentificationNumber

categorical

Approximate Distinct Count 3268
Approximate Unique (%) 96.8%
Missing 0
Missing (%) 0.0%
Memory Size 253217
  • The largest value (1625049001) is over 1.6 times larger than the second largest value (0002400002)

Length

Mean 10.005
Standard Deviation 0.2604
Median 10
Minimum 9
Maximum 25

Sample

1st row 0659000030
2nd row 0659000220
3rd row 0659000475
4th row 0659000640
5th row 0659000970

Letter

Count 3
Lowercase Letter 3
Space Separator 2
Uppercase Letter 0
Dash Punctuation 2
Decimal Number 33770
  • TaxParcelIdentificationNumber contains many words: 3268 words
  • The largest value (1625049001) is over 1.6 times larger than the second largest value (0925049346)

CouncilDistrictCode

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 222816
  • The largest value (7) is over 1.74 times larger than the second largest value (3)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 7
2nd row 7
3rd row 7
4th row 7
5th row 7

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3376
  • CouncilDistrictCode has words of constant length

Neighborhood

categorical

Approximate Distinct Count 19
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory Size 253585

Length

Mean 10.114
Standard Deviation 5.2085
Median 9
Minimum 4
Maximum 22

Sample

1st row DOWNTOWN
2nd row DOWNTOWN
3rd row DOWNTOWN
4th row DOWNTOWN
5th row DOWNTOWN

Letter

Count 31826
Lowercase Letter 488
Space Separator 1896
Uppercase Letter 31338
Dash Punctuation 0
Decimal Number 0

Latitude

numerical

Approximate Distinct Count 2876
Approximate Unique (%) 85.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 47.624
Minimum 47.4992
Maximum 47.7339
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Latitude is skewed right (γ1 = 0.14)

Quantile Statistics

Minimum 47.4992
5-th Percentile 47.5417
Q1 47.5999
Median 47.6187
Q3 47.6571
95-th Percentile 47.713
Maximum 47.7339
Range 0.2347
IQR 0.05726

Descriptive Statistics

Mean 47.624
Standard Deviation 0.04776
Variance 0.002281
Sum 160778.7358
Skewness 0.14
Kurtosis -0.1427
Coefficient of Variation 0.001003
  • Latitude is not normally distributed (p-value 3.229414558895358e-08)
  • Latitude has 16 outliers

Longitude

numerical

Approximate Distinct Count 2656
Approximate Unique (%) 78.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean -122.3348
Minimum -122.4142
Maximum -122.221
Zeros 0
Zeros (%) 0.0%
Negatives 3376
Negatives (%) 100.0%
  • Longitude is skewed left (γ1 = -0.1375)

Quantile Statistics

Minimum -122.4142
5-th Percentile -122.3865
Q1 -122.3507
Median -122.3325
Q3 -122.3194
95-th Percentile -122.2898
Maximum -122.221
Range 0.1933
IQR 0.03126

Descriptive Statistics

Mean -122.3348
Standard Deviation 0.0272
Variance 0.00074002
Sum -413002.2686
Skewness -0.1375
Kurtosis 0.2602
Coefficient of Variation -0.00022237
  • Longitude is not normally distributed (p-value 9.157457703417057e-05)
  • Longitude has 92 outliers

YearBuilt

numerical

Approximate Distinct Count 113
Approximate Unique (%) 3.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 1968.5732
Minimum 1900
Maximum 2015
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • YearBuilt is skewed left (γ1 = -0.5392)

Quantile Statistics

Minimum 1900
5-th Percentile 1908
Q1 1948
Median 1975
Q3 1997
95-th Percentile 2012
Maximum 2015
Range 115
IQR 49

Descriptive Statistics

Mean 1968.5732
Standard Deviation 33.0882
Variance 1094.8261
Sum 6.6459e+06
Skewness -0.5392
Kurtosis -0.8718
Coefficient of Variation 0.01681

NumberofBuildings

numerical

Approximate Distinct Count 17
Approximate Unique (%) 0.5%
Missing 8
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 53888
Mean 1.1069
Minimum 0
Maximum 111
Zeros 92
Zeros (%) 2.7%
Negatives 0
Negatives (%) 0.0%
  • NumberofBuildings is skewed right (γ1 = 43.3757)

Quantile Statistics

Minimum 0
5-th Percentile 1
Q1 1
Median 1
Q3 1
95-th Percentile 1
Maximum 111
Range 111
IQR 0

Descriptive Statistics

Mean 1.1069
Standard Deviation 2.1084
Variance 4.4454
Sum 3728
Skewness 43.3757
Kurtosis 2202.0219
Coefficient of Variation 1.9048
  • NumberofBuildings is not normally distributed (p-value 4.255933577254892e-25)
  • NumberofBuildings has 193 outliers

NumberofFloors

numerical

Approximate Distinct Count 50
Approximate Unique (%) 1.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 4.7091
Minimum 0
Maximum 99
Zeros 16
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • NumberofFloors is skewed right (γ1 = 5.9197)

Quantile Statistics

Minimum 0
5-th Percentile 1
Q1 2
Median 4
Q3 5
95-th Percentile 12
Maximum 99
Range 99
IQR 3

Descriptive Statistics

Mean 4.7091
Standard Deviation 5.4945
Variance 30.1891
Sum 15898
Skewness 5.9197
Kurtosis 55.866
Coefficient of Variation 1.1668
  • NumberofFloors is not normally distributed (p-value 1.2367423662453623e-15)
  • NumberofFloors has 240 outliers

PropertyGFATotal

numerical

Approximate Distinct Count 3195
Approximate Unique (%) 94.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 94833.5373
Minimum 11285
Maximum 9320156
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PropertyGFATotal is skewed right (γ1 = 24.1187)

Quantile Statistics

Minimum 11285
5-th Percentile 21291.5
Q1 28487
Median 44175
Q3 90992
95-th Percentile 320096
Maximum 9320156
Range 9308871
IQR 62505

Descriptive Statistics

Mean 94833.5373
Standard Deviation 218837.6071
Variance 4.789e+10
Sum 3.2016e+08
Skewness 24.1187
Kurtosis 944.8369
Coefficient of Variation 2.3076
  • PropertyGFATotal is not normally distributed (p-value 5.704154744267337e-25)
  • PropertyGFATotal has 370 outliers

PropertyGFAParking

numerical

Approximate Distinct Count 496
Approximate Unique (%) 14.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 8001.5261
Minimum 0
Maximum 512608
Zeros 2872
Zeros (%) 85.1%
Negatives 0
Negatives (%) 0.0%
  • PropertyGFAParking is skewed right (γ1 = 6.6482)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 46400.75
Maximum 512608
Range 512608
IQR 0

Descriptive Statistics

Mean 8001.5261
Standard Deviation 32326.7239
Variance 1.045e+09
Sum 2.7013e+07
Skewness 6.6482
Kurtosis 58.8858
Coefficient of Variation 4.0401
  • PropertyGFAParking is not normally distributed (p-value 4.655512975899382e-25)
  • PropertyGFAParking has 504 outliers
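
The report does not say how outliers are counted. One reading that is consistent with the figures above is Tukey's fences with an assumed multiplier of 1.5: because Q1 = Q3 = 0, both fences collapse to 0 and every non-zero value is flagged. The helper `tukey_fences` below is illustrative, not the tool's own code.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; k=1.5 is an assumed multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Figures copied from the PropertyGFAParking block.
n_rows, n_missing, n_zeros = 3376, 0, 2872

lower, upper = tukey_fences(q1=0, q3=0)

# With both fences at 0, every non-missing, non-zero value lies outside them
# (no negatives are reported, so all of them sit above the upper fence).
n_nonzero = n_rows - n_missing - n_zeros

print(lower, upper)  # 0.0 0.0
print(n_nonzero)     # 504
```

The same arithmetic gives 3376 - 9 - 3237 = 130 for SteamUse(kBtu), which also matches its reported outlier count.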

PropertyGFABuilding(s)

numerical

Approximate Distinct Count 3193
Approximate Unique (%) 94.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 54016
Mean 86832.0113
Minimum 3636
Maximum 9320156
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PropertyGFABuilding(s) is skewed right (γ1 = 27.6121)

Quantile Statistics

Minimum 3636
5-th Percentile 21021
Q1 27756
Median 43216
Q3 84276.25
95-th Percentile 282658.5
Maximum 9320156
Range 9316520
IQR 56520.25

Descriptive Statistics

Mean 86832.0113
Standard Deviation 207939.8119
Variance 4.3239e+10
Sum 2.9314e+08
Skewness 27.6121
Kurtosis 1159.6392
Coefficient of Variation 2.3947
  • PropertyGFABuilding(s) is not normally distributed (p-value 5.462680091303936e-25)
  • PropertyGFABuilding(s) has 345 outliers

ListOfAllPropertyUseTypes

categorical

Approximate Distinct Count 466
Approximate Unique (%) 13.8%
Missing 9
Missing (%) 0.3%
Memory Size 306176
  • The largest value (Multifamily Housing) is over 1.87 times larger than the second largest value (Multifamily Housing, Parking)

Length

Mean 25.9344
Standard Deviation 17.166
Median 20
Minimum 5
Maximum 255

Sample

1st row Hotel
2nd row Hotel, Parking, Re...
3rd row Hotel
4th row Hotel
5th row Hotel, Parking, Sw...

Letter

Count 76720
Lowercase Letter 66361
Space Separator 6537
Uppercase Letter 10359
Dash Punctuation 633
Decimal Number 294

LargestPropertyUseType

categorical

Approximate Distinct Count 56
Approximate Unique (%) 1.7%
Missing 20
Missing (%) 0.6%
Memory Size 272695
  • The largest value (Multifamily Housing) is over 3.35 times larger than the second largest value (Office)

Length

Mean 16.256
Standard Deviation 6.6956
Median 19
Minimum 5
Maximum 52

Sample

1st row Hotel
2nd row Hotel
3rd row Hotel
4th row Hotel
5th row Hotel

Letter

Count 50806
Lowercase Letter 44355
Space Separator 2796
Uppercase Letter 6451
Dash Punctuation 445
Decimal Number 278
  • The top 2 categories (Multifamily Housing, Office) take over 50.0%

LargestPropertyUseTypeGFA

numerical

Approximate Distinct Count 3122
Approximate Unique (%) 93.0%
Missing 20
Missing (%) 0.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 53696
Mean 79177.6386
Minimum 5656
Maximum 9.3202e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • LargestPropertyUseTypeGFA is skewed right (γ1 = 30.0825)

Quantile Statistics

Minimum 5656
5-th Percentile 17609
Q1 25094.75
Median 39894
Q3 76200.25
95-th Percentile 243388.5
Maximum 9.3202e+06
Range 9.3145e+06
IQR 51105.5

Descriptive Statistics

Mean 79177.6386
Standard Deviation 201703.4075
Variance 4.0684e+10
Sum 2.6572e+08
Skewness 30.0825
Kurtosis 1318.6413
Coefficient of Variation 2.5475
  • LargestPropertyUseTypeGFA is not normally distributed (p-value 4.988763339194176e-25)
  • LargestPropertyUseTypeGFA has 354 outliers

SecondLargestPropertyUseType

categorical

Approximate Distinct Count 50
Approximate Unique (%) 3.0%
Missing 1697
Missing (%) 50.3%
Memory Size 124544
  • The largest value (Parking) is over 4.54 times larger than the second largest value (Office)

Length

Mean 9.1775
Standard Deviation 5.6839
Median 7
Minimum 5
Maximum 52

Sample

1st row Parking
2nd row Parking
3rd row Parking
4th row Parking
5th row Parking

Letter

Count 14751
Lowercase Letter 12541
Space Separator 464
Uppercase Letter 2210
Dash Punctuation 79
Decimal Number 12
  • The largest value (parking) is over 4.28 times larger than the second largest value (office)

SecondLargestPropertyUseTypeGFA

numerical

Approximate Distinct Count 1352
Approximate Unique (%) 80.5%
Missing 1697
Missing (%) 50.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 26864
Mean 28444.0758
Minimum 0
Maximum 686750
Zeros 126
Zeros (%) 3.7%
Negatives 0
Negatives (%) 0.0%
  • SecondLargestPropertyUseTypeGFA is skewed right (γ1 = 5.029)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 5000
Median 10664
Q3 26640
95-th Percentile 117338.6
Maximum 686750
Range 686750
IQR 21640

Descriptive Statistics

Mean 28444.0758
Standard Deviation 54392.9179
Variance 2.9586e+09
Sum 4.7758e+07
Skewness 5.029
Kurtosis 36.1905
Coefficient of Variation 1.9123
  • SecondLargestPropertyUseTypeGFA is not normally distributed (p-value 6.355510361136565e-23)
  • SecondLargestPropertyUseTypeGFA has 206 outliers

ThirdLargestPropertyUseType

categorical

Approximate Distinct Count 44
Approximate Unique (%) 7.4%
Missing 2780
Missing (%) 82.3%
Memory Size 45890

Length

Mean 11.9966
Standard Deviation 7.5697
Median 11
Minimum 5
Maximum 52

Sample

1st row Restaurant
2nd row Swimming Pool
3rd row Data Center
4th row Swimming Pool
5th row Office

Letter

Count 6623
Lowercase Letter 5602
Space Separator 374
Uppercase Letter 1021
Dash Punctuation 58
Decimal Number 4

ThirdLargestPropertyUseTypeGFA

numerical

Approximate Distinct Count 501
Approximate Unique (%) 84.1%
Missing 2780
Missing (%) 82.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 9536
Mean 11738.6752
Minimum 0
Maximum 459748
Zeros 48
Zeros (%) 1.4%
Negatives 0
Negatives (%) 0.0%
  • ThirdLargestPropertyUseTypeGFA is skewed right (γ1 = 9.1738)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 2239
Median 5043
Q3 10138.75
95-th Percentile 41654.5
Maximum 459748
Range 459748
IQR 7899.75

Descriptive Statistics

Mean 11738.6752
Standard Deviation 29331.1993
Variance 8.6032e+08
Sum 6.9963e+06
Skewness 9.1738
Kurtosis 113.2214
Coefficient of Variation 2.4987
  • ThirdLargestPropertyUseTypeGFA is not normally distributed (p-value 6.19993594183718e-24)
  • ThirdLargestPropertyUseTypeGFA has 61 outliers

YearsENERGYSTARCertified

categorical

Approximate Distinct Count 65
Approximate Unique (%) 54.6%
Missing 3257
Missing (%) 96.5%
Memory Size 9191
  • The largest value (2016) is over 1.75 times larger than the second largest value (20172016)

Length

Mean 12.2353
Standard Deviation 10.4449
Median 8
Minimum 4
Maximum 60

Sample

1st row 2016
2nd row 2016
3rd row 2014
4th row 2016
5th row 2016

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1456

ENERGYSTARScore

numerical

Approximate Distinct Count 100
Approximate Unique (%) 3.9%
Missing 843
Missing (%) 25.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 40528
Mean 67.9187
Minimum 1
Maximum 100
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ENERGYSTARScore is skewed left (γ1 = -0.859)

Quantile Statistics

Minimum 1
5-th Percentile 12
Q1 53
Median 75
Q3 90
95-th Percentile 99
Maximum 100
Range 99
IQR 37

Descriptive Statistics

Mean 67.9187
Standard Deviation 26.8733
Variance 722.1727
Sum 172038
Skewness -0.859
Kurtosis -0.2215
Coefficient of Variation 0.3957

SiteEUI(kBtu/sf)

numerical

Approximate Distinct Count 1085
Approximate Unique (%) 32.2%
Missing 7
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 53904
Mean 54.7321
Minimum 0
Maximum 834.4
Zeros 16
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • SiteEUI(kBtu/sf) is skewed right (γ1 = 4.9797)

Quantile Statistics

Minimum 0
5-th Percentile 16.98
Q1 27.9
Median 38.6
Q3 60.4
95-th Percentile 146.9
Maximum 834.4
Range 834.4
IQR 32.5

Descriptive Statistics

Mean 54.7321
Standard Deviation 56.2731
Variance 3166.6645
Sum 184392.5001
Skewness 4.9797
Kurtosis 39.9335
Coefficient of Variation 1.0282
  • SiteEUI(kBtu/sf) is not normally distributed (p-value 9.513730079070996e-17)
  • SiteEUI(kBtu/sf) has 264 outliers

SiteEUIWN(kBtu/sf)

numerical

Approximate Distinct Count 1105
Approximate Unique (%) 32.8%
Missing 6
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 53920
Mean 57.0338
Minimum 0
Maximum 834.4
Zeros 29
Zeros (%) 0.9%
Negatives 0
Negatives (%) 0.0%
  • SiteEUIWN(kBtu/sf) is skewed right (γ1 = 4.8254)

Quantile Statistics

Minimum 0
5-th Percentile 17.4
Q1 29.4
Median 40.9
Q3 64.275
95-th Percentile 149.155
Maximum 834.4
Range 834.4
IQR 34.875

Descriptive Statistics

Mean 57.0338
Standard Deviation 57.1633
Variance 3267.6463
Sum 192203.9
Skewness 4.8254
Kurtosis 37.5819
Coefficient of Variation 1.0023
  • SiteEUIWN(kBtu/sf) is not normally distributed (p-value 1.3284986403439675e-15)
  • SiteEUIWN(kBtu/sf) has 251 outliers

SourceEUI(kBtu/sf)

numerical

Approximate Distinct Count 1648
Approximate Unique (%) 48.9%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 134.2328
Minimum 0
Maximum 2620
Zeros 24
Zeros (%) 0.7%
Negatives 0
Negatives (%) 0.0%
  • SourceEUI(kBtu/sf) is skewed right (γ1 = 6.5921)

Quantile Statistics

Minimum 0
5-th Percentile 37.86
Q1 74.7
Median 96.2
Q3 143.9
95-th Percentile 351.67
Maximum 2620
Range 2620
IQR 69.2

Descriptive Statistics

Mean 134.2328
Standard Deviation 139.2876
Variance 19401.0226
Sum 451962
Skewness 6.5921
Kurtosis 77.5477
Coefficient of Variation 1.0377
  • SourceEUI(kBtu/sf) is not normally distributed (p-value 1.061694422331082e-20)
  • SourceEUI(kBtu/sf) has 302 outliers

SourceEUIWN(kBtu/sf)

numerical

Approximate Distinct Count 1694
Approximate Unique (%) 50.3%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 137.7839
Minimum -2.1
Maximum 2620
Zeros 36
Zeros (%) 1.1%
Negatives 1
Negatives (%) 0.0%
  • SourceEUIWN(kBtu/sf) is skewed right (γ1 = 6.5668)

Quantile Statistics

Minimum -2.1
5-th Percentile 37.7
Q1 78.4
Median 101.1
Q3 148.35
95-th Percentile 353.86
Maximum 2620
Range 2622.1
IQR 69.95

Descriptive Statistics

Mean 137.7839
Standard Deviation 139.1098
Variance 19351.5383
Sum 463918.5
Skewness 6.5668
Kurtosis 77.3251
Coefficient of Variation 1.0096
  • SourceEUIWN(kBtu/sf) is not normally distributed (p-value 3.3678972949347213e-19)
  • SourceEUIWN(kBtu/sf) has 295 outliers

SiteEnergyUse(kBtu)

numerical

Approximate Distinct Count 3354
Approximate Unique (%) 99.5%
Missing 5
Missing (%) 0.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 53936
Mean 5.4037e+06
Minimum 0
Maximum 8.7392e+08
Zeros 18
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • SiteEnergyUse(kBtu) is skewed right (γ1 = 24.8309)

Quantile Statistics

Minimum 0
5-th Percentile 491819.9531
Q1 925128.5938
Median 1.8038e+06
Q3 4.2225e+06
95-th Percentile 1.8162e+07
Maximum 8.7392e+08
Range 8.7392e+08
IQR 3.2973e+06

Descriptive Statistics

Mean 5.4037e+06
Standard Deviation 2.1611e+07
Variance 4.6702e+14
Sum 1.8216e+10
Skewness 24.8309
Kurtosis 857.3437
Coefficient of Variation 3.9993
  • SiteEnergyUse(kBtu) is not normally distributed (p-value 4.501458345066686e-25)
  • SiteEnergyUse(kBtu) has 383 outliers

SiteEnergyUseWN(kBtu)

numerical

Approximate Distinct Count 3341
Approximate Unique (%) 99.1%
Missing 6
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 53920
Mean 5.2767e+06
Minimum 0
Maximum 4.7161e+08
Zeros 29
Zeros (%) 0.9%
Negatives 0
Negatives (%) 0.0%
  • SiteEnergyUseWN(kBtu) is skewed right (γ1 = 15.2623)

Quantile Statistics

Minimum 0
5-th Percentile 503320.811
Q1 970182.2344
Median 1.9045e+06
Q3 4.3814e+06
95-th Percentile 1.8203e+07
Maximum 4.7161e+08
Range 4.7161e+08
IQR 3.4112e+06

Descriptive Statistics

Mean 5.2767e+06
Standard Deviation 1.5939e+07
Variance 2.5404e+14
Sum 1.7783e+10
Skewness 15.2623
Kurtosis 334.0071
Coefficient of Variation 3.0206
  • SiteEnergyUseWN(kBtu) is not normally distributed (p-value 5.86210553457195e-25)
  • SiteEnergyUseWN(kBtu) has 381 outliers

SteamUse(kBtu)

numerical

Approximate Distinct Count 131
Approximate Unique (%) 3.9%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 274595.8982
Minimum 0
Maximum 1.3494e+08
Zeros 3237
Zeros (%) 95.9%
Negatives 0
Negatives (%) 0.0%
  • SteamUse(kBtu) is skewed right (γ1 = 26.709)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 1.3494e+08
Range 1.3494e+08
IQR 0

Descriptive Statistics

Mean 274595.8982
Standard Deviation 3.9122e+06
Variance 1.5305e+13
Sum 9.2456e+08
Skewness 26.709
Kurtosis 803.6677
Coefficient of Variation 14.247
  • SteamUse(kBtu) is not normally distributed (p-value 4.243604056134355e-25)
  • SteamUse(kBtu) has 130 outliers

Electricity(kWh)

numerical

Approximate Distinct Count 3352
Approximate Unique (%) 99.6%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 1.0866e+06
Minimum -33826.8008
Maximum 1.9258e+08
Zeros 14
Zeros (%) 0.4%
Negatives 1
Negatives (%) 0.0%
  • Electricity(kWh) is skewed right (γ1 = 28.7157)

Quantile Statistics

Minimum -33826.8008
5-th Percentile 72675.6398
Q1 187422.9453
Median 345129.9063
Q3 829317.8438
95-th Percentile 3.9442e+06
Maximum 1.9258e+08
Range 1.9261e+08
IQR 641894.8984

Descriptive Statistics

Mean 1.0866e+06
Standard Deviation 4.3525e+06
Variance 1.8944e+13
Sum 3.6587e+09
Skewness 28.7157
Kurtosis 1155.7789
Coefficient of Variation 4.0055
  • Electricity(kWh) is not normally distributed (p-value 4.515135763332271e-25)
  • Electricity(kWh) has 391 outliers

Electricity(kBtu)

numerical

Approximate Distinct Count 3351
Approximate Unique (%) 99.5%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 3.7076e+06
Minimum -115417
Maximum 6.5707e+08
Zeros 14
Zeros (%) 0.4%
Negatives 1
Negatives (%) 0.0%
  • Electricity(kBtu) is skewed right (γ1 = 28.7157)

Quantile Statistics

Minimum -115417
5-th Percentile 247969.2
Q1 639487
Median 1.1776e+06
Q3 2.8296e+06
95-th Percentile 1.3458e+07
Maximum 6.5707e+08
Range 6.5719e+08
IQR 2.1901e+06

Descriptive Statistics

Mean 3.7076e+06
Standard Deviation 1.4851e+07
Variance 2.2054e+14
Sum 1.2484e+10
Skewness 28.7157
Kurtosis 1155.7789
Coefficient of Variation 4.0055
  • Electricity(kBtu) is not normally distributed (p-value 4.515135763332271e-25)
  • Electricity(kBtu) has 391 outliers
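
Electricity(kWh) and Electricity(kBtu) report identical skewness, kurtosis and coefficient of variation, which is what one would expect if one column is a constant multiple of the other. The sketch below assumes a factor of 3.412 kBtu per kWh (not stated in the report, but it reproduces the reported minimum) and uses a plain population skewness formula, which may differ from the estimator the tool applies. The toy sample is hypothetical.

```python
KBTU_PER_KWH = 3.412  # assumed conversion factor, not stated in the report

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Reported kWh figures converted to kBtu.
mean_kbtu = 1.0866e6 * KBTU_PER_KWH    # reported kBtu mean: 3.7076e+06
min_kbtu = -33826.8008 * KBTU_PER_KWH  # reported kBtu minimum: -115417

# Toy right-skewed sample (hypothetical values, not from the dataset).
kwh = [1.0, 2.0, 2.0, 3.0, 20.0]
kbtu = [v * KBTU_PER_KWH for v in kwh]

print(f"{mean_kbtu:.4e}")  # 3.7075e+06, close to the reported mean (inputs are rounded)
print(round(min_kbtu))     # -115417
# Rescaling by a positive constant leaves skewness unchanged.
print(abs(skewness(kwh) - skewness(kbtu)) < 1e-9)  # True
```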

NaturalGas(therms)

numerical

Approximate Distinct Count 2109
Approximate Unique (%) 62.6%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 13685.0454
Minimum 0
Maximum 2.9791e+06
Zeros 1258
Zeros (%) 37.3%
Negatives 0
Negatives (%) 0.0%
  • NaturalGas(therms) is skewed right (γ1 = 30.0255)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 3237.5376
Q3 11890.335
95-th Percentile 49023.7242
Maximum 2.9791e+06
Range 2.9791e+06
IQR 11890.335

Descriptive Statistics

Mean 13685.0454
Standard Deviation 67097.8083
Variance 4.5021e+09
Sum 4.6078e+07
Skewness 30.0255
Kurtosis 1199.2479
Coefficient of Variation 4.903
  • NaturalGas(therms) is not normally distributed (p-value 4.41418684797157e-25)
  • NaturalGas(therms) has 336 outliers

NaturalGas(kBtu)

numerical

Approximate Distinct Count 2109
Approximate Unique (%) 62.6%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 1.3685e+06
Minimum 0
Maximum 2.9791e+08
Zeros 1258
Zeros (%) 37.3%
Negatives 0
Negatives (%) 0.0%
  • NaturalGas(kBtu) is skewed right (γ1 = 30.0255)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 323754
Q3 1.189e+06
95-th Percentile 4.9024e+06
Maximum 2.9791e+08
Range 2.9791e+08
IQR 1.189e+06

Descriptive Statistics

Mean 1.3685e+06
Standard Deviation 6.7098e+06
Variance 4.5021e+13
Sum 4.6078e+09
Skewness 30.0255
Kurtosis 1199.2479
Coefficient of Variation 4.903
  • NaturalGas(kBtu) is not normally distributed (p-value 4.41418684797157e-25)
  • NaturalGas(kBtu) has 336 outliers

DefaultData

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 236207
  • The largest value (False) is over 28.88 times larger than the second largest value (True)

Length

Mean 4.9665
Standard Deviation 0.1799
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 16767
Lowercase Letter 13391
Space Separator 0
Uppercase Letter 3376
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (False, True) take over 50.0%
  • The largest value (false) is over 28.88 times larger than the second largest value (true)

Comments

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 229568

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan

Letter

Count 10128
Lowercase Letter 10128
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • Comments has words of constant length

ComplianceStatus

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 252164
  • The largest value (Compliant) is over 28.42 times larger than the second largest value (Error - Correct Default Data)

Length

Mean 9.6931
Standard Deviation 3.4383
Median 9
Minimum 9
Maximum 28

Sample

1st row Compliant
2nd row Compliant
3rd row Compliant
4th row Compliant
5th row Compliant

Letter

Count 32107
Lowercase Letter 28340
Space Separator 467
Uppercase Letter 3767
Dash Punctuation 150
Decimal Number 0
  • The top 2 categories (Compliant, Error - Correct Default Data) take over 50.0%
  • The largest value (compliant) is over 25.09 times larger than the second largest value (data)

Outlier

categorical

Approximate Distinct Count 2
Approximate Unique (%) 6.2%
Missing 3344
Missing (%) 99.1%
Memory Size 2441
  • The largest value (Low outlier) is over 2.56 times larger than the second largest value (High outlier)

Length

Mean 11.2812
Standard Deviation 0.4568
Median 11
Minimum 11
Maximum 12

Sample

1st row High outlier
2nd row Low outlier
3rd row Low outlier
4th row High outlier
5th row Low outlier

Letter

Count 329
Lowercase Letter 297
Space Separator 32
Uppercase Letter 32
Dash Punctuation 0
Decimal Number 0

TotalGHGEmissions

numerical

Approximate Distinct Count 2818
Approximate Unique (%) 83.7%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 119.724
Minimum -0.8
Maximum 16870.98
Zeros 9
Zeros (%) 0.3%
Negatives 1
Negatives (%) 0.0%
  • TotalGHGEmissions is skewed right (γ1 = 19.4732)

Quantile Statistics

Minimum -0.8
5-th Percentile 3.78
Q1 9.495
Median 33.92
Q3 93.94
95-th Percentile 392.797
Maximum 16870.98
Range 16871.78
IQR 84.445

Descriptive Statistics

Mean 119.724
Standard Deviation 538.8322
Variance 290340.1683
Sum 403110.61
Skewness 19.4732
Kurtosis 474.1855
Coefficient of Variation 4.5006
  • TotalGHGEmissions is not normally distributed (p-value 4.7085410297132765e-25)
  • TotalGHGEmissions has 367 outliers

GHGEmissionsIntensity

numerical

Approximate Distinct Count 511
Approximate Unique (%) 15.2%
Missing 9
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 53872
Mean 1.1759
Minimum -0.02
Maximum 34.09
Zeros 12
Zeros (%) 0.4%
Negatives 1
Negatives (%) 0.0%
  • GHGEmissionsIntensity is skewed right (γ1 = 5.5907)

Quantile Statistics

Minimum -0.02
5-th Percentile 0.13
Q1 0.21
Median 0.61
Q3 1.37
95-th Percentile 3.961
Maximum 34.09
Range 34.11
IQR 1.16

Descriptive Statistics

Mean 1.1759
Standard Deviation 1.8215
Variance 3.3177
Sum 3959.31
Skewness 5.5907
Kurtosis 57.2852
Coefficient of Variation 1.549
  • GHGEmissionsIntensity is not normally distributed (p-value 2.2440711870042847e-21)
  • GHGEmissionsIntensity has 263 outliers

Interactions

Correlations

Missing Values